feat(backup): add global lock to prevent concurrent backup OOM crashes#474
Merged
mrrobot47 merged 4 commits intoEasyEngine:developfrom Dec 26, 2025
Merged
Conversation
Add proper shell argument escaping to prevent command injection in rclone operations: - rclone size --json (line 995) - rclone lsf --dirs-only (line 1172) - rclone copy in rclone_download() (line 1251) - rclone copy in rclone_upload() (line 1280) - rclone lsf after upload (line 1292) Note: rclone purge commands in cleanup_old_backups() and rollback_failed_backup() were already properly escaped. This is a defense-in-depth measure as paths are derived from site_url in the database, but protects against potential command injection if malicious data were to be inserted.
…kups Implement flock()-based serialization to ensure only one backup runs at a time. This prevents OOM killer (exit code 137) when multiple backups are triggered simultaneously by EasyDash, as each backup calculates available RAM independently. Problem: When 5 backups trigger at once, each checks available RAM (e.g., 8GB) and allocates buffers accordingly. Total memory = 5 × 8GB = 40GB → OOM killer. Solution: Global lock ensures backups run sequentially. Each backup gets accurate RAM reading and full system resources. Implementation details: - Uses PHP flock() for atomic, race-condition-free locking - Waiting backups poll every 30 seconds with status messages - Maximum wait time of 2 hours before timeout (error code 5001) - Lock file stores current backup site URL and PID for debugging - Shutdown handler ensures lock release on any exit (error, crash, kill) - release_global_backup_lock() is idempotent (safe to call multiple times) Lock file location: EE_ROOT_DIR/services/backup-global.lock Error codes: - 5001: Timeout waiting for another backup to complete - 5002: Cannot create backup lock file
Move backup-global.lock from EE_ROOT_DIR/services/ to EE_BACKUP_DIR/ to keep all backup-related files in one location. Lock file location: EE_BACKUP_DIR/backup-global.lock
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds a global file-based locking mechanism to serialize concurrent site backups and prevent OOM crashes, while also improving security by escaping shell arguments in rclone commands.
Key Changes:
- Implements flock-based global backup lock with configurable 24-hour timeout and 60-second polling interval
- Adds escapeshellarg() to all rclone command invocations to prevent shell injection vulnerabilities
- Introduces new error codes (5001, 5002) and ERROR_TYPE_LOCK constant for lock-related failures
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Error code 5001 was already used for PHP Fatal Errors in dash_shutdown_handler. Changed lock timeout to 5003 to avoid collision.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
escapeshellarg()to all rclone shell commands to prevent shell injection vulnerabilitiesProblem
When EasyEngine Dashboard triggers
ee site backupfor multiple sites simultaneously, each backup process independently checks available RAM and allocates memory buffers. With 5+ concurrent backups on an 8GB server, total memory allocation exceeds available RAM, causing the OOM killer to terminate processes (exit code 137).Solution
Implement a global backup lock using
flock():Lock file location:
EE_BACKUP_DIR/backup-global.lockError Codes
5002: Cannot create backup lock file5003: Timeout waiting for another backup to completeNotes
flock()may not work reliably on NFS or other network filesystems;EE_BACKUP_DIRshould be on a local filesystem--listflag returns early without acquiring the lock (read-only operation)